-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[batch] fix g2 machine memory calculation #14498
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving my comments from our discussion yesterday. This looks great and I'm really happy we caught this bug. Just a few comments on the organization.
It would also be great to have some unit testing around the problematic function: gcp_worker_memory_per_core_mib
. Take a look at batch/test/test_utils.py
, it contains a couple unit tests that just test the behavior of a couple of individual functions. You can run that test file with pytest batch/test/test_utils.py
. This would be a good place to add tests for gcp_worker_memory_per_core_mib
. We should cover at least a handful of n1
s and g2
s and some invalid / not-yet-supported machine types that should raise some sort of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really great! I think there's one spot that was missed w.r.t. hard-coding n1
s and then a little more cleanup, but we're getting really close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks for all your work fixing this sneaky bug!
The memory that machines use is calculated based off of the number of cores the machine has, with a fix ratio set of cores to Gb of RAM. The code assumed a ratio of 3.75 cores: 1 Gb but this assumption does not hold outside of the n1 machine family. This PR changes the functions that calculate memory from number of cores to account for this by passing the machine_type as the main parameter rather than the worker type. Then, logic is added to use the appropriate ratio of 4:1 for the g2 family of machines.